Cream of the Crop 1

Text File | 1992-04-24 | 9KB | 178 lines

DICMERGE.EXE (VERSION 2)
DICTIONARY MAINTENANCE FOR JWRITE

1. Introduction.

JWRITE has no built-in functions for the maintenance of the dictionary. This is nevertheless a desirable feature to have. If the standard dictionary does not contain kanji equivalents of words which you frequently need in your field of business, it would be nice to be able to add them to the dictionary. This IS possible, using the enclosed program DICMERGE.EXE, but I must warn you that this is a rather complicated operation. Please follow these instructions carefully. Before you start, it is advisable to copy the dictionary (WNNSJIS.DIC) and its index file (WNNSJIS.IND) somewhere for safekeeping.

2. Dictionary files

A dictionary file like WNNSJIS.DIC consists of lines of SJIS coded text. Every line consists of the following elements, from left to right:

a- the KEYWORD, which must be in ASCII (hankaku romaji) or hiragana. Katakana keywords are not allowed. From version 2, the keyword is allowed to be a "key phrase" which may include spaces. A line may not, however, begin with a space; this is considered a "sort violation" and causes the program to abort (see below).

b- one SPACE (don't forget this!)

c- one RIGHT SLASH (/).

d- one or more possible TRANSLATIONS for the keyword. The translations may be written in any character type: katakana, hiragana, kanji, big or small ASCII, or special characters. Every translation, including the last one, must be followed by a right slash. Any text after the last slash will be ignored.

e- a LINE FEED CHARACTER for signalling the end of the line. It would be possible to have carriage return - line feed combinations at the end of each line, but in a dictionary containing tens of thousands of lines, that would just be tens of thousands of extra bytes.

An example of a dictionary line is:

   きこう /寄港/機構/気候/

In other words, the dictionary is just a text file which can be read and edited (in principle) by JWRITE itself.
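As a rough illustration of rules a-e, the following Python sketch splits one dictionary line into its keyword and translations. It is not taken from DICMERGE; the function names are hypothetical, and it assumes that every keyword character must be printable ASCII or hiragana and that the first " /" in the line ends the keyword.

```python
def is_valid_keyword(keyword: str) -> bool:
    """Assumption: each character is printable ASCII or hiragana (U+3041..U+3093)."""
    return bool(keyword) and all(
        " " <= ch <= "~" or "\u3041" <= ch <= "\u3093" for ch in keyword
    )


def parse_dictionary_line(line: str):
    """Return (keyword, translations), or None if the line breaks rules a-d."""
    line = line.rstrip("\r\n")            # rule e: accept LF or CR/LF endings
    if not line or line[0] == " ":        # a line may not begin with a space
        return None
    sep = line.find(" /")                 # rules b and c: one space, then a slash
    if sep <= 0:
        return None
    keyword = line[:sep]
    rest = line[sep + 2:]
    last = rest.rfind("/")                # text after the last slash is ignored
    if last < 0:
        return None                       # only one slash in the line
    translations = [t for t in rest[:last].split("/") if t]
    if not translations or not is_valid_keyword(keyword):
        return None
    return keyword, translations


if __name__ == "__main__":
    print(parse_dictionary_line("きこう /寄港/機構/気候/\n"))
```

Applied to the sample entry for きこう, the sketch yields the keyword and its three translations; a line with a katakana keyword or with only one slash yields None.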
In theory, JWRITE could be directly used for maintenance work on its own dictionary. Unfortunately, the size of the dictionary file makes this impossible in practice. The dictionary file cannot possibly fit in memory. (You can try loading it with JWRITE WNNSJIS.DIC, to see the beginning of the dictionary, but please do not actually try to change anything. You can also view, but not edit, the entire dictionary if you use Vernon Buerg's LIST.COM, version 7.5i. Use the /B switch to let LIST run under KDPLUS. If you use the KDPLUS keyboard input utility KJIN, you can even look up words in the dictionary using LIST.)

However, you can add new information to the dictionary by making a small dictionary for yourself, containing the information that you want to add, and merging it with the existing dictionary. This can be done with the program DICMERGE.

Notice that it is only possible to ADD to the dictionary this way. You cannot remove anything from it, at least not with this utility. Utilities for removing dictionary lines can, however, be made. They should overwrite the un-needed lines with spaces; a DICMERGE operation on the file will then re-create a valid index.

3. Making an update dictionary

You make your update dictionary as a text file, using the rules a-d specified above (rule e is not important for the update file, because DICMERGE will convert CR/LF combinations to single LF's). You can add completely new keywords with their translations, and also new translations for existing keywords.

For instance, the present version of WNNSJIS.DIC has only one translation for the keyword ちゅうい, namely 注意. Now imagine that you often need military terms, and you want to have the word 中尉 (also pronounced ちゅうい) in the dictionary as well.
Your update text (call it, for instance, PRIVATE.DIC) must then contain the line

   ちゅうい /中尉/

(Because 中尉 is not in the dictionary yet, you must construct the word from the separate kanji 中 and 尉, which are in the dictionary as single kanji, to be found through their pronunciations ちゅう and い).

Your update text may contain many such lines. It is IMPORTANT (in fact this is the most important and the most difficult bit of the whole operation) that the file be SORTED:

-the lines with alphabetical keyword must come before the lines with hiragana keyword

-the lines with alphabetical keyword must be in alphabetical order (in fact, in standard ASCII order. Look at any ASCII table).

-the lines with hiragana keyword must be in Japanese kana order (あ-い-う-え-お-か-き-く-け....etc.)

When you have finished, save the file. If you're not sure about the sorting, use the DOS SORT utility.

4. Merging with the existing dictionary

Now go to DOS and type

   DICMERGE

Type in the names of the 2 dictionaries: file 1 is PRIVATE.DIC, file 2 is the existing dictionary, WNNSJIS.DIC. The new (merged) dictionary will be called MERGE.DIC; at the same time an index file will be made for it, MERGE.IND. MERGE.DIC will separate lines by line feeds only, no carriage returns.

If you type in only one dictionary name, and just press ENTER when asked for the other one, DICMERGE will still run; its only function will then be to re-create the index for the one dictionary that you specified. (You might need this if the index file has become lost or corrupted).

The merge process will take some time (a minute or so, for a large dictionary). You can follow its progress on your screen, as new entries are constructed for the index (this works best when you are in the KDPLUS environment).

Lines which are illegally-formed (e.g. lines with only one slash in them, or lines which begin with a katakana or a kanji) are discarded. The program will warn you, but continue with the merge process.
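Before running DICMERGE, the sorting rule from section 3 can be checked on an update dictionary with a small script. The Python sketch below is only an illustration under assumptions that this document does not confirm: that the required order equals the byte order of the Shift-JIS encoded keywords (which places ASCII before hiragana and hiragana in code order), and that repeated keywords on consecutive lines are acceptable. The function names are hypothetical.

```python
def keyword_of(line: str) -> str:
    """Return the text before the first ' /', or the whole line if there is none."""
    head, sep, _ = line.partition(" /")
    return head if sep else line.rstrip("\r\n")


def first_sort_violation(lines):
    """Return the 1-based number of the first out-of-order line, or None.

    Assumption: keywords are compared as Shift-JIS byte strings, so every
    ASCII keyword (bytes below 0x80) sorts before every hiragana keyword
    (lead byte 0x82).
    """
    previous = None
    for number, line in enumerate(lines, start=1):
        key = keyword_of(line).encode("shift_jis")
        if previous is not None and key < previous:
            return number
        previous = key
    return None


if __name__ == "__main__":
    update = ["abc /x/", "あい /愛/", "ちゅうい /中尉/"]
    print(first_sort_violation(update))
```

A result of None means no violation was found under these assumptions; DICMERGE itself remains the final judge of whether the file is acceptable.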
However, lines which are otherwise legally-formed but are not in proper sorted order will cause the program to abort, displaying the location of the "sort violation". You can then try to correct the situation before proceeding to the next step. If the program halts for that reason, there will be no MERGE.DIC and MERGE.IND files generated (for your protection).

When the merge process is finished, you must enter the following commands:

   del wnnsjis.*     (I hope you saved the old version somewhere..)
   ren merge.* wnnsjis.*

From that moment, the new keywords and new meanings have been added to the dictionary, and are accessible by means of the ALT-L function of JWRITE.

If there were "merged" entries (the same keyword occurring in both input dictionaries but with different translations) the translations from the first input dictionary will be listed first on the corresponding line of the output dictionary.

5. A tip.

Here and there in public domain sources you can find dictionary files. If you are sure that they conform to the rules mentioned in section 2, you can merge them with your existing dictionary to increase its capabilities. There will be a penalty: the bigger the dictionary, the slower it will be in general. Test it with a "slow word", like 新聞 (the position of this word in the list is such that looking for it will take some time, more than a second).

6. New features in version 2.

It is now possible to do a DICMERGE on only one dictionary (just press ENTER when asked for the other one). This will re-create the index.

The program performs more stringent checking on the dictionary lines, reducing the chances of destroying your dictionary by merging it with a file containing illegal lines.

It is now possible (in principle) to delete lines from the dictionary by overwriting them with spaces.

"Key phrases" (with spaces in them) are now allowed. However, the lookup mechanism of JWRITE will not recognize them unless your version of JWRITE is 1.5 or higher.
Katakana keywords are now detected and discarded. In the previous version it seemed that you could get away with including katakana keywords (if you put them at the end of the update dictionary), but in fact each katakana entry would make some alphabetic entries inaccessible by destroying the index pointers to them.

7. Note.

A file NUMBERS.DIC, with which you can extend the dictionary with novel number symbols like ⑤ and Ⅷ, has been provided in this archive for test purposes.

Tokyo, 5 January 1992
(Revised 3 March 1992)

Jan W. Stumpel